Selecting effective index terms using a decision tree

Authors

  • Takenobu Tokunaga
  • Kenji Kimura
  • Hironori Ogibayashi
  • Hozumi Tanaka
Abstract

This paper explores the effectiveness of index terms more complex than the single words used in conventional information retrieval systems. Retrieval is performed in two phases. In the first phase, a conventional retrieval method (the Okapi system) is used; in the second phase, complex index terms, such as syntactic relations and single words with part-of-speech information, are introduced to rerank the results of the first phase. The effectiveness of the different types of index terms was evaluated through experiments using the TREC-7 test collection and 50 queries. The experiments showed that retrieval effectiveness improved for 32 of the 50 queries. Based on this investigation, we introduced a method to select effective index terms by using a decision tree. Experiments with the same test collection showed that retrieval effectiveness was improved in half of
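The two-phase retrieval described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a basic BM25 formula stands in for the Okapi system, adjacent word pairs stand in for syntactic relations, and the names `bm25_scores`, `adjacent_pairs`, `rerank`, `top_k`, and `weight` are all hypothetical. The decision-tree selection step is not covered.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Phase 1 (assumed stand-in for Okapi): basic BM25 over tokenized docs."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: number of docs containing each term.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avg_len)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores


def adjacent_pairs(tokens):
    """Hypothetical complex index terms: ordered adjacent word pairs."""
    return set(zip(tokens, tokens[1:]))


def rerank(query, docs, top_k=10, weight=1.0):
    """Phase 2: rerank only the top_k first-phase results.

    Each document in the head gets a bonus of `weight` per complex term
    it shares with the query; documents below top_k keep their order.
    """
    base = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: base[i], reverse=True)
    head, tail = ranked[:top_k], ranked[top_k:]
    q_pairs = adjacent_pairs(query)
    head.sort(
        key=lambda i: base[i] + weight * len(q_pairs & adjacent_pairs(docs[i])),
        reverse=True,
    )
    return head + tail
```

Reranking only the head of the first-phase list keeps the more expensive complex-term matching limited to a small candidate set, which is the usual motivation for a two-phase design.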

Similar resources

Forecasting Of Tehran Stock Exchange Index by Using Data Mining Approach Based on Artificial Intelligence Algorithms

Uncertainty in the capital market means the difference between the expected values and the amounts that actually occur. Designing different analytical and forecasting methods in the capital market is also less likely due to the high amount of this and the need to know future prices with greater certainty or uncertainty. In order to capitalize on the capital market, investors have always sough...

Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection

Due to the rise of technology, the possibility of fraud in different areas such as banking has increased. Credit card fraud is a crucial problem in banking, and its danger is ever increasing. This paper proposes an advanced data mining method that considers both feature selection and decision cost to enhance the accuracy of credit card fraud detection. After selecting the best and most effec...

Steel Buildings Damage Classification by damage spectrum and Decision Tree Algorithm

Results of damage prediction in buildings can serve as a useful tool for managing and reducing seismic risk. In this study, a damage spectrum and the C4.5 decision tree algorithm were used to predict damage in steel buildings during earthquakes. To prepare the damage spectrum, steel buildings were modeled as a single-degree-of-freedom (SDOF) system and time-history...

Comparison of Error Tree Analysis and TRIPOD BETA in Accident Analysis of a Power Plant Industry Using Hierarchical Analysis

Introduction: Given the importance of accident analysis, a proper technique is needed to analyze accidents precisely and to provide corrective and preventive measures that prevent recurrence. Method: In this descriptive-analytical paper, the most important criteria for investigating and selecting accident investigation and analysis techniques and selecting...

Evaluation of liquefaction potential based on CPT results using C4.5 decision tree

The prediction of liquefaction potential of soil due to an earthquake is an essential task in Civil Engineering. The decision tree is a tree structure consisting of internal and terminal nodes which process the data to ultimately yield a classification. C4.5 is a known algorithm widely used to design decision trees. In this algorithm, a pruning process is carried out to solve the problem of the...

Journal:
  • Natural Language Engineering

Volume 8  Issue

Pages  -

Publication date: 2002